feat: cli OpenAI-compatible API response_format support #884
markstur wants to merge 13 commits into generative-computing:main

Conversation
- Added `JsonSchemaFormat` model to represent JSON schema definitions
- Extended `ResponseFormat` to support `json_schema` type (in addition to existing `text` and `json_object`)
- Used field alias to avoid conflict with Pydantic's `schema` method
- Added `_json_schema_to_pydantic()` utility function to dynamically convert JSON schemas to Pydantic models
- Updated `_build_model_options()` to exclude `response_format` from model options (handled separately)
- Modified `make_chat_endpoint()` to:
  - Parse `response_format` from requests
  - Convert the `json_schema` type to Pydantic models using the utility function
  - Detect whether the serve function accepts a `format` parameter using `inspect.signature()`
  - Pass the generated Pydantic model as the `format=` parameter to serve functions that support it
  - Handle backward compatibility with serve functions that don't accept `format`
- Added proper error handling for invalid schemas
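The signature-based detection described above can be sketched as follows; `accepts_format`, `legacy_serve`, and `new_serve` are hypothetical names for illustration, not the PR's actual identifiers:

```python
import inspect

def accepts_format(serve_fn) -> bool:
    """Check once, at endpoint-build time, whether the serve function
    declares a `format` parameter (or accepts arbitrary **kwargs)."""
    params = inspect.signature(serve_fn).parameters
    return "format" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )

# Hypothetical serve functions illustrating old and new signatures.
def legacy_serve(prompt):
    return prompt

def new_serve(prompt, format=None):
    return prompt
```

A chat endpoint built this way only passes `format=` when the check is true, which is what keeps older serve functions working unchanged.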
- Test json_schema format is converted to Pydantic model and passed to serve
- Test json_object format doesn't pass a schema
- Test text format doesn't pass a schema
- Test error handling for missing json_schema field
- Test error handling for invalid JSON schemas
- Test backward compatibility with serve functions without format parameter
- Test optional fields in JSON schemas
When a client sends a request with `response_format.type = "json_schema"`, the server:
1. Extracts the JSON schema from `response_format.json_schema.schema`
2. Dynamically creates a Pydantic model from the schema
3. Passes it as the `format=` parameter to the serve function
4. The serve function can then use this for constrained decoding via Mellea's `instruct()` method
This maps OpenAI's `response_format` API to Mellea's native `format=` parameter for structured output.
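A minimal sketch of the conversion step, assuming a flat object schema with simple property types (the PR's actual `_json_schema_to_pydantic()` handles more cases, such as enums; `json_schema_to_pydantic` and `_TYPE_MAP` are illustrative names):

```python
from typing import Optional
from pydantic import BaseModel, create_model

# Map JSON-schema primitive type names to Python types.
_TYPE_MAP = {"string": str, "integer": int, "number": float, "boolean": bool}

def json_schema_to_pydantic(schema: dict, name: str = "ResponseModel") -> type[BaseModel]:
    """Build a Pydantic model from a flat JSON object schema."""
    required = set(schema.get("required", []))
    fields = {}
    for prop, spec in schema.get("properties", {}).items():
        py_type = _TYPE_MAP.get(spec.get("type"), str)
        if prop in required:
            fields[prop] = (py_type, ...)              # required field
        else:
            fields[prop] = (Optional[py_type], None)   # optional, defaults to None
    return create_model(name, **fields)

# Example: a schema a client might send under response_format.json_schema.schema
Person = json_schema_to_pydantic({
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name"],
})
```

The resulting class is then handed to the serve function as `format=Person` for constrained decoding.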
Signed-off-by: Mark Sturdevant <[email protected]>
The PR description has been updated. Please fill out the template for your PR to be reviewed.
jakelorocco left a comment:
I have a broader question that is touched on in my comments below: If we trust our backend provider to properly handle our structured output requests, why do we do any validation on our side? (Because module.serve might do something funky?)
* elif/pass not needed. Comment
* set finish reason from result

Signed-off-by: Mark Sturdevant <[email protected]>
OpenAI will return a

If we're trying to be OpenAI compatible (which is the intent here), my default response would be to say we should emulate that behaviour -- though I am uncomfortable in some ways with returning syntactically invalid JSON. I guess the rationale is 'we returned what we could'. In some ways it comes down to our 'contract' with callers.

For streaming this is a key difference, as we fail validation after the data has been streamed. We also have an inconsistency between streaming (here) and non-streaming. For the latter we're already returning the same as OpenAI, since we missed doing any validation...

I'd say to remove the validation -- then we have the same behaviour, and it's down to the `format` parameter and how the backend constrained decoding works. I think this is what @jakelorocco describes. I think this also means my earlier comment about ADDING streaming validation is something I no longer feel is the right way forward. In fact the opposite -- remove it.
+1 on emulating openai as faithfully as possible
* inspect serve() to set accepts_format and is_async in make_chat_endpoint instead of on every request
* improved support for json_schema that can be converted to pydantic
* no output validation. The module is given a format when applicable, but instead of throwing an error if the formatting is incorrect, the results are sent to the client.
* consolidated redundant blocks of kwargs by building a dict of the kwargs needed
* improved setting of finish_reason
* added tests
* improved comments/docstrings where clarification was needed

Signed-off-by: Mark Sturdevant <[email protected]>
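The kwargs consolidation mentioned above could look something like this sketch; `build_serve_kwargs` and its parameters are hypothetical names, not the PR's actual code:

```python
def build_serve_kwargs(prompt, response_model=None, accepts_format=False):
    """Build the keyword arguments for a serve call in one place,
    including format= only when the serve function supports it."""
    kwargs = {"prompt": prompt}
    if accepts_format and response_model is not None:
        kwargs["format"] = response_model
    return kwargs
```

Building the dict once replaces several near-identical call sites that each repeated the `format`-or-not branching.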
Removed the validation, so now it is more consistent with what you might expect from an OpenAI API server, which will return what it has even if it is truncated or otherwise broken.
planetf1 left a comment:
Suggesting some changes. I think the main one is around the enum clash -- the others add value over what was there before, so those comments are more about completeness than regression. It would be good to fix them up, but I am minded to approve when you are comfortable with the changes. Nothing else is a blocker.
Add a Pydantic model_validator to the ResponseFormat class to validate that the json_schema field is provided when type is 'json_schema'. This moves validation from the endpoint handler to the model level, providing earlier error detection and a cleaner separation of concerns.

Changes:
- Import model_validator from pydantic in cli/serve/models.py
- Add a validate_json_schema_required() validator to ResponseFormat
- Update the test to expect ValidationError at model instantiation

The validator raises ValueError with the message "json_schema field is required when type is 'json_schema'" when validation fails.

Signed-off-by: Mark Sturdevant <[email protected]>
Assisted-by: IBM Bob
- Remove the redundant validation check from cli/serve/app.py
- Use cast() for type narrowing

Signed-off-by: Mark Sturdevant <[email protected]>
Assisted-by: IBM Bob
The exception handler at line 178 in cli/serve/app.py now catches both ValueError and TypeError when validating JSON schemas. This handles edge cases where:
- Enum values contain unhashable types (lists, dicts)
- Pydantic's create_model() receives malformed field definitions

Other exceptions (KeyError, AttributeError, RecursionError) are intentionally not caught, as they indicate internal bugs in the schema converter, not invalid user input.

Signed-off-by: Mark Sturdevant <[email protected]>
Assisted-by: IBM Bob
Replace conditional asserts with unconditional checks in test_json_object_format_no_schema and test_text_format_no_schema.

The previous pattern `if "format" in kwargs: assert kwargs["format"] is None` would pass silently if format was absent, failing to verify the intended behavior. Changed to `assert "format" not in kwargs` to properly test that no format parameter is passed for the json_object and text response types.

Signed-off-by: Mark Sturdevant <[email protected]>
Assisted-by: IBM Bob
… to string Signed-off-by: Mark Sturdevant <[email protected]>
Replace Enum generation with Literal[tuple(values)] to preserve case-variant enum values like ["open", "OPEN"] that were previously collapsing to a single member.

Signed-off-by: Mark Sturdevant <[email protected]>
Assisted-by: IBM Bob
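The collapse is easy to demonstrate; this sketch assumes the old converter keyed Enum members on an upper-cased value name, which the commit message implies but does not show:

```python
from typing import Literal, get_args

values = ["open", "OPEN"]

# Keying members on the upper-cased value collapses case variants:
# "open" and "OPEN" both map to the key "OPEN", so one is silently lost.
members = {v.upper(): v for v in values}

# Literal keeps every distinct value exactly as given.
Status = Literal[tuple(values)]
```

Pydantic validates `Literal` fields by value equality, so both `"open"` and `"OPEN"` remain accepted and distinct.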
Addressed all the feedback. Thanks for the great feedback!

So, the limited finish_reason implementation is not too bad now, but really it should've been its own PR before this one, with other PRs gradually doing bits of it. The fact that we need a home-grown schema_converter is unfortunate, but now we have one, and it has gotten better and better in this PR.
Type of PR: Misc